-
Notifications
You must be signed in to change notification settings - Fork 2.5k
feat: add deterministic naming for Qdrant collections #7943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add support for custom collection names via .roo/codebase-index.json - Use git repository URL for deterministic collection naming across worktrees - Fall back to workspace path hash when no git repo is available - Normalize git URLs for consistent hashing - Add comprehensive tests for new naming strategies Fixes #7940
- Add better error handling in normalizeGitUrl method - Remove unused import of getGitRepositoryInfo - Improve URL credential removal logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing my own code because apparently I trust no one, not even myself.
| private normalizeGitUrl(url: string): string { | ||
| try { | ||
| // Remove credentials from HTTPS URLs | ||
| let normalized = url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The credential removal logic here is good, but consider adding a comment explaining why this is important for security (preventing credential leakage in collection names). This helps future maintainers understand the security implications.
| try { | ||
| const configPath = path.join(workspacePath, ".roo", "codebase-index.json") | ||
| if (fs.existsSync(configPath)) { | ||
| const config = JSON.parse(fs.readFileSync(configPath, "utf8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this silently falls back when JSON parsing fails. Should we consider logging malformed JSON at a higher level than console.warn to help users debug configuration issues? Or perhaps add validation for the expected schema?
| const configContent = fs.readFileSync(configPath, "utf8") | ||
|
|
||
| // Extract remote URL | ||
| const urlMatch = configContent.match(/url\s*=\s*(.+?)(?:\r?\n|$)/m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex pattern might not handle all edge cases in git config files (e.g., URLs with spaces or special characters). Have you considered using a more robust git config parser library, or should we add more comprehensive error handling for edge cases?
| * @param workspacePath Path to the workspace | ||
| * @returns Collection name | ||
| */ | ||
| private generateCollectionName(workspacePath: string): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding JSDoc comments to these new private methods explaining the fallback hierarchy (custom name → git URL → workspace path). This would make the code more self-documenting for future contributors.
| expect((vectorStore as any).vectorSize).toBe(mockVectorSize) | ||
| }) | ||
|
|
||
| it("should use custom collection name from .roo/codebase-index.json if available", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great test coverage! Consider adding a few more edge case tests:
- Malformed JSON in .roo/codebase-index.json
- Git repositories with multiple remotes
- Collection name length edge cases (approaching the 255 character limit)
These would help ensure robustness across different user configurations.
| private loadCustomCollectionName(workspacePath: string): string | undefined { | ||
| try { | ||
| const configPath = path.join(workspacePath, ".roo", "codebase-index.json") | ||
| if (fs.existsSync(configPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor performance note: These synchronous file operations could potentially block the event loop. Since this only happens during initialization it's probably fine, but worth noting for future optimization if initialization performance becomes a concern.
|
A possible solution is to let the user create a We’d need to confirm a few things first, like how to handle simultaneous writes to the index file, so it doesn’t break. |
This PR attempts to address Issue #7940 by implementing deterministic naming for Qdrant collections.
Problem
RooCode's Codebase Indexing currently creates Qdrant collections based on absolute file paths, causing:
Solution
This PR implements a three-tier naming strategy for Qdrant collections:
.roo/codebase-index.jsonconfiguration fileKey Changes
.roo/codebase-index.jsonQdrantVectorStoreto use git repository URL for deterministic collection namingBenefits
Testing
Fixes #7940
Feedback and guidance are welcome!
Important
Introduces deterministic naming for Qdrant collections using custom names, git URLs, and workspace paths, with comprehensive testing and improved error handling.
qdrant-client.tsusing a three-tier strategy: custom names from.roo/codebase-index.json, git repository URLs, and workspace paths.qdrant-client.spec.tsto cover new naming logic and edge cases.QdrantVectorStoreto handle new naming logic and improve error handling.This description was created by
for 426e703. You can customize this summary. It will automatically update as commits are pushed.